Attorney Docket No.: 8600-0138.3

Transmittal of Utility Patent Application for Filing
Certification Under 37 C.F.R. §1.10 (if applicable)



EM 383 301 716 US

"Express Mail" Label Number

Date of Deposit

I hereby certify that this Transmittal letter, enclosed application, and any other documents referred
to as enclosed herein are being deposited in an envelope with the United States Postal Service
"Express Mail Post Office to Addressee" service under 37 CFR §1.10 on the date indicated above
and addressed to the Assistant Commissioner for Patents, Washington, D.C. 20231.



JAmrrs Cabautan 



(Print Name of Person Mailing Application)

(Signature of Person Mailing Application)



The present invention is a continuation-in-part of
U.S. patent application Serial No. 08/514,875 for "Method
and Gene-Array Device for Analyzing Gene Expression
Patterns", filed August 14, 1995, which is a
continuation-in-part of U.S. patent application Serial No.
08/477,809 for "Method and Apparatus for Fabricating
Microarrays of Biological Samples", filed June 7, 1995,
which is a continuation-in-part of U.S. patent
application Serial No. 08/261,388 for "Method and Apparatus for
Fabricating Microarrays of Biological Samples", filed
June 17, 1994. These three applications are incorporated
herein by reference.



Field of the Invention

This invention relates to a method for determining 
the relative abundances of a plurality of polynucleotide 
sequences.

References 

Adams, M.D., et al., Science 252:1651 (1991).

Adams, M.D., et al., Nature 355:632 (1992).

Ausubel, F.M., et al., Eds., in Current Protocols in
Molecular Biology, Greene & Wiley Interscience, New York, NY
(1994).

Baeuerle, P.A., et al., Annu. Rev. Immunol. 12:141
(1994).

Becker, J., and Craig, E.A., Eur. J. Biochem.
219:11 (1994).

Bohlander, et al., Genomics 13:1222 (1992).

Chan, A.C., et al., Annu. Rev. Immunol. 12:555
(1994).

Cohen, G.B., et al., Cell 80:237 (1995).

Collins, F.S., Proc. Natl. Acad. Sci. USA 92:10821
(1995).

Crabtree, G.R., et al., Annu. Rev. Biochem. 63:1045
(1994).

Craig, E.A., et al., Cell 78:365 (1994).

Cyr, D.M., Trends Biochem. Sci. 19:176 (1994).

Jakob, U., et al., Trends Biochem. Sci. 19:205
(1994).

Jindal, S., Trends Biotechnol. 14:17 (1996).

Lehrach, H., et al., "Hybridization Fingerprinting
in Genome Mapping and Sequencing," in Genome Analysis Volume 1:
Genetic and Physical Mapping (Davies, K.E., and Tilghman,
S.M., Eds.) Cold Spring Harbor Laboratory Press, (Cold
Spring Harbor, NY) pp. 39-80 (1990).

Liou, H.-C., et al., Curr. Op. Cell Biol. 5:477
(1993).

Mullis, K.B., et al., U.S. Patent No. 4,683,195,
issued 28 July 1987.

Nelson, S.F., et al., Nature Genetics 4:11-17
(1993).

Newton, A.C., J. Biol. Chem. 270:28495 (1995).

Nishizuka, Y., FASEB J. 9:484 (1995).

Riles, et al., Genetics 114:81 (1993).

Rohan, P.J., et al., Science 259:1763 (1993).

Sambrook, J., et al., in Molecular Cloning: A Laboratory
Manual, Second Edition, Cold Spring Harbor Laboratory
(Cold Spring Harbor, NY) (1989).

Schena, M., et al., Science 270:467-470 (1995).

Shalon, D., thesis, Stanford University (1995).

Thanos, D., et al., Cell 80:529 (1995).

Wilkinson, K.D., Annu. Rev. Nutr. 15:161 (1995).


Background of the Invention

With a large number of human genes now identified 
through the Human Genome Project, there is great interest 
in finding out how these genes act in concert to regulate 
the whole organism (Watson, Collins, 1995; Adams, et al.,
1991, 1992; Cohen, et al., 1995; Chan, et al., 1994;
Crabtree, et al., 1994). 

The current focus of this research is in monitoring
the genes' activities, i.e., levels of expression, as a
function of cell type, cell condition, disease state, or
drug therapy response. The general requirements of a
method of this type are severalfold:

First, the method must be able to handle large
numbers of genes at once. Ideally, all or nearly all
expressed genes from a given cell type may be
represented.

Secondly, the method should be responsive to relatively
small, as well as to large, changes in the levels
of expression. For example, in monitoring the response
of genes in a cell exposed to a given drug, it may be
important to detect slight changes in levels of
expression of genes, i.e., a [illegible] increase or decrease in
expression level, in order to identify drug-responsive
genes. As another example, in studying the response of
genes in a given disease state, e.g., tumor state, it may
be important, in understanding the relationship between
gene expression and disease state, to classify genes
according to low, moderate, and high shifts in gene
expression. More generally, the greater the resolution that
can be achieved, in terms of relative levels of
expression, the more information that can be obtained from the
studies.

Finally, the method should be amenable to small 
amounts of polynucleotide sample material, to obviate the 
need for amplification of sample material, with the 
attendant possibility of differential amplification. 

A variety of methods for studying changes in gene 
expression have been proposed heretofore. One approach 
is based on differential hybridization between nucleic 
acid fractions from control and test sources. After 
hybridization of cDNA's from the two sources, those 
species "equally expressed" can be removed as DNA 

hybrids, leaving overexpressed or underexpressed cDNA's 
in single-stranded form. The single-stranded species can 
then be further characterized, e.g., by electrophoretic 
fractionation. 

This approach has been useful as a starting point 
for isolation or characterization of polynucleotides of 
interest in a polynucleotide mixture, but is not suited
to following small changes in expression levels, or 
tracking changes in a relatively large number of genes. 

Another approach for comparing the relative abun- 
dances of polynucleotides in mixtures involves hybridi- 
zing individual labeled probe mixtures with different 
replicate filters containing immobilized gene sequences, 
e.g., colonies of cloned DNA. Colonies on the replicate 
filters which hybridize differentially to the two probe 
mixtures then represent gene sequences which are present 
in greater or lesser abundance in the two probe mixtures. 

Because this method relies on comparing the amounts
of label at corresponding positions on two different
substrates (or filters), the resolution of the method is
limited by (i) variations in the amount of immobilized
DNA at corresponding array positions on the two different
substrates, (ii) variations in the extent of hybridization
that occurs at corresponding array positions, e.g., due to
differences in probe accessibility to the immobilized
DNA, and (iii) variations in measured reporter levels at
corresponding array positions. To improve the resolution
appreciably, one would have to average out these
variations by conducting many hybridization measurements in
parallel.

It would thus be useful to provide, for measuring 
the relative abundances of polynucleotides in complex 
mixtures, an improved method that avoids or overcomes the 
limitations above. In particular, the method should be 
adaptable to measuring the relative copy
numbers or expression levels in a very large number 
(e.g., 50,000 to 100,000) of genes, at a high resolution, 
e.g., two-fold change in relative abundance, and at high 
sensitivity for rare genes. At the same time, the method 
should be simple to carry out. 

Summary of the Invention

The present invention is directed to a method of 
determining the relative amounts of individual poly- 
nucleotides in a complex mixture of different-sequence 
polynucleotides. The method includes labeling the 
polynucleotides with a fluorescent reporter, and con- 
tacting the labeled polynucleotides, under hybridization 
conditions, with an array of different DNA sequences 
disposed at discrete locations on a non-porous surface, at
a density of at least about 100 sequences/cm². The
different DNA sequences in the array are each present in 
multiple copies, and are effective to hybridize to 
individual polynucleotides in the mixture. The level of 
fluorescence at each position in the microarray is then 
determined. 

The labeled polynucleotides are preferably contacted 
with the microarray by covering the array surface with a
solution of the mixture of labeled polynucleotides, to a
solution depth of less than 500 microns. 

In various preferred embodiments, the density of 
array elements corresponding to different-sequence DNA 
locations in the array is at least 1,000/cm², the DNA
sequences in the array are at least about 50 bases in 
length, and the labeled polynucleotides represent at 
least 1 million unique base sequences. 

The method may be used in determining the relative 
amounts of each polynucleotide from first and second 
different sources. Here (i) the polynucleotides from the 
first and second sources are labeled with independently 
detectable first and second fluorescent reporters, 
respectively, (ii) the contacting of labeled polynucleo- 
tides from first and second sources is carried out 
simultaneously under competitive hybridization condi- 
tions, and (iii) the determining step includes measuring 
the levels of the two reporters at each position in the 
array. 

These and other objects and features of the inven- 
tion will become more fully apparent when the following 
detailed description of the invention is read in conjunc- 
tion with the accompanying figures. 

Brief Description of the Drawings
Fig. 1 shows a portion of a two-dimensional micro- 
array of different DNA sequences used in practicing the 
method of the invention; 

Fig. 2 shows a fluorescent image of an actual 20 x
20 array of 400 fluorescently-labeled DNA samples
immobilized on a poly-l-lysine coated slide, where the total
area covered by the 400 element array is 16 square
millimeters;

Fig. 3A is a fluorescent image of a 1.8 cm x 1.8 cm
microarray containing lambda clones with yeast inserts,
the fluorescent signal arising from the hybridization to
the array with approximately half the yeast genome
labeled with a green fluorophore and the other half with
a red fluorophore;
Fig. 3B shows the translation of the hybridization
image of Fig. 3A into a karyotype of the yeast genome,
based on the hybridization pattern of the Fig. 3A
microarray which contains yeast DNA sequences that have been
previously physically mapped in the yeast genome;

Figs. 4A and 4B show scans of hybridization signals
from an array of genes probed with fluorescently-labeled
Arabidopsis wild-type (4A) or transgenic HAT4 (4B) cDNA
at low photomultiplier tube settings;

Figs. 5A and 5B show scans of hybridization signals
from an array of genes probed with fluorescently-labeled
Arabidopsis wild-type root (5A) or wild-type leaf (5B)
cDNA at intermediate photomultiplier tube settings;

Fig. 6 shows a combined two-color scan of a
microarray containing 1,046 cDNA's from peripheral blood
lymphocytes (PBL's) after hybridization with a mixture of
cDNA's from bone marrow labeled with Cy5-dCTP and cDNA's
from Jurkat cells labeled with fluorescein-dCTP, where
red spots indicate greater gene expression levels in bone
marrow, green spots, greater gene expression levels in
Jurkat cells, and yellow spots, comparable gene
expression levels in both cells;

Fig. 7 shows a combined two-color scan of a
microarray containing 1,046 cDNA's from peripheral blood
lymphocytes (PBL's) after hybridization with a mixture of
cDNA's from heat-shocked Jurkat cells labeled with
Cy5-dCTP and cDNA's from control Jurkat cells labeled with
fluorescein-dCTP, where red spots indicate greater gene
expression levels in heat-shocked cells, green spots,
greater gene expression levels in unshocked cells, and
yellow spots, comparable gene expression
levels in both cells;

Fig. 8 shows a combined two-color scan of a
microarray containing 1,046 cDNA's from peripheral blood
lymphocytes (PBL's) after hybridization with a mixture of
cDNA's from phorbol-ester-treated Jurkat cells labeled
with Cy5-dCTP and cDNA's from control Jurkat cells
labeled with fluorescein-dCTP, where red spots indicate
greater gene expression levels in phorbol-ester-treated
cells, green spots, greater gene expression levels in
untreated cells, and yellow spots, comparable
gene expression levels in both cells;

Figs. 9A and 9B are schematic displays of activated
and repressed genes in Jurkat cells in response to heat
shock (9A) and phorbol ester (9B), where the colors
indicated on the display correspond to array elements
that display greater than 2-fold elevation (red), less
than a 2-fold change (black), or greater than 2-fold
repression (green);

Figs. 10A-10L are Northern RNA "dot" blots of
samples of RNAs from control Jurkat cells (-HS) or
heat-shocked Jurkat cells (+HS), after spotting onto nylon
membranes, and blotting with the designated cDNA probes
from Table 1;

Figs. 11A-11E are Northern RNA "dot" blots of
samples of RNAs from control Jurkat cells (-PMA) or
phorbol-ester-treated Jurkat cells (+PMA), after spotting
onto nylon membranes, and blotting with the designated
cDNA probes from Table 1; and

Figs. 12A-12C show transcript profiles of heat
shock- and phorbol ester-regulated genes, where gene
expression levels per 10^ mRNAs (x-axes) are shown for 15
genes (Table 1) in human bone marrow, brain, prostate,
and heart, and the genes are grouped according to
expression levels (A-C).



Detailed Description of the Invention

I. Definitions

Unless indicated otherwise, the terms defined below 
have the following meanings: 
A "polynucleotide" refers to a DNA or RNA polymer at
least about 50 bases in length, i.e., containing at least
about 50 nucleotide subunits.

"Different-sequence polynucleotides" refers to 
polynucleotides having different, unique base sequences. 

Such polynucleotides are distinguished by their ability
to hybridize selectively to complementary-strand nucleic 
acid sequences under selected hybridization conditions. 

A "complex mixture of different-sequence
polynucleotides" refers to a mixture containing a plurality, and
preferably at least 100, different-sequence
polynucleotides. The complexity of the mixture is defined by the
aggregate unique base sequence in the different-sequence
polynucleotides. For example, a mixture of 1,000
polynucleotides, each containing 1,000 bases of unique sequence,
will have a complexity of 1 million bases.
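
The complexity definition above amounts to a sum of unique-sequence lengths. A minimal sketch of that arithmetic follows; the function name `mixture_complexity` is hypothetical and introduced only for illustration.

```python
def mixture_complexity(unique_lengths):
    """Return the aggregate unique sequence length, in bases.

    `unique_lengths` holds, for each different-sequence polynucleotide,
    the number of bases of unique sequence it contributes.
    """
    return sum(unique_lengths)


# The example from the text: 1,000 polynucleotides, each contributing
# 1,000 bases of unique sequence.
lengths = [1000] * 1000
print(mixture_complexity(lengths))  # 1000000
```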

A "microarray of DNA sequences" refers to a spatial
array of nucleic acid polymers, e.g., polymers of at
least 15-20 bases, preferably polynucleotides of 50 bases
or more, having a different DNA sequence at the different
microarray locations. The different sequences may be of
known sequence, or partially or completely sequenced, as
with polymers prepared by solid-phase synthesis,
primer-amplified genomic DNA fragments or expressed-sequence
tags (EST clones) (Adams, et al., 1991, 1992), or they
may be unsequenced, as with cDNA's isolated from
a library. The array may also contain regions with
different graded concentrations or sequence lengths of
same-sequence polynucleotides, and/or mixtures of two or
more different sequences. The different DNA sequences
may also include DNA analogs, such as analogs having
phosphonate or phosphoramidate backbones, capable of
hybridizing, through Watson-Crick base pairing, with
complementary-sequence DNA or RNA. The microarray has a
density of distinct gene sequences of at least about
100/cm². The regions in the microarray have typical
dimensions, e.g., diameters, in the range between about
10-500 µm, and are separated from other regions in the
same array by about the same distance.

"Cells of a given cell type or types" refers to 
cells obtained from one or more particular tissues or 
organs, e.g., hepatocytes, heart muscle cells, pancreatic 
cells, or non-differentiated embryonic tissue, or to a 
particular blood cell type or types, e.g., peripheral 
blood lymphocytes. 

Cells having a "selected physiological state or
disease condition" or "test cells" refer to cells of a
given cell type or types which, as examples, (i) are in
a defined state of differentiation or activation, e.g.,
by gene activation; (ii) are infected by a defined
infectious agent, e.g., HIV-infected T cells; (iii) are
in a neoplastic state, i.e., tumor cells; (iv) are in a
chemical- or physical-response state, i.e., after
exposure to a pharmacological agent with respect to control
cells of the same type or types; (v) are in a defined
stage of cell cycle; (vi) have a particular chronological
age; (vii) are undergoing response to an extracellular
signal; (viii) have a genetic defect; (ix) are from a
patient with a particular physiological disease state; or
(x) are taken from particular anatomical locations or
microenvironments.

Cells which are in a normal or control or an
alternative reference state with respect to "test cells" are
referred to herein as "control cells".




"Reporter-labeled copies of messenger nucleic acid"
refers to reporter-labeled mRNA transcripts obtained from
test or control cells or cDNAs produced from such
transcripts. The reporter label is any detectable reporter,
and typically a fluorescent reporter.

A "cDNA" refers to a cloned, amplified, synthesized, 
single-stranded, double-stranded, or first strand product 
DNA corresponding in sequence to all or a portion of an 
mRNA. 



II. DNA Sequence Microarrays

This section describes microarrays of different DNA
sequences disposed at discrete locations on a non-porous
surface, at a density of at least about 100 sequences/cm².
The DNA sequences in the array are each (i) present in
multiple copies, and (ii) effective to hybridize to an
individual polynucleotide in a mixture of
polynucleotides, in accordance with the method of the invention.

Methods of forming microarrays of this type have
been detailed in the above-cited, co-owned U.S. patent
applications Serial Nos. 08/514,875, 08/477,809, and
08/261,388, and related PCT application WO/95/35505,
published December 28, 1995, all of which are
incorporated by reference herein. Details are also given in the
Methods below.

Briefly, a capillary- or tweezer-like fluid
dispenser in a robotic apparatus is employed to pick up a
selected DNA solution from a solution source, and
distribute the solution to each of a plurality of non-porous
microarray substrates, at a selected microarray location
on the substrate surface, and in a selected amount
(solution volume). A droplet of solution is deposited on
the substrate surface by tapping the dispenser on the
substrate at the selected deposition region. The volume
of material deposited can be controlled according to the
dimensions of the dispenser, the tapping force, the
viscosity of solution, and the duration of application,
to achieve a selected deposition volume, as described in
the above co-owned patent applications. Typically
deposition volumes are between about 2 x lO"' to 2 
nanoliters (nl), corresponding in droplet size (surface
diameter) to between about 20 and 200 µm. The DNA in
each microarray region is preferably present in a defined
amount between about 0.1 femtomoles and 100 nanomoles.

This operation is repeated for each microarray 
position, until the array is completed, with a different 
DNA sequence at each position (recognizing that some 
microarray positions may be duplicates with the same or 
different amounts of a single sequence, and some
microarray positions may contain combinations of two or more
different DNA sequences).

As noted in the definitions above, the regions
of the microarray are about 20-500 µm in diameter,
preferably about 50-200 µm, with spacing between adjacent
regions on the microarray surface of about the same
dimensions. The microarray density is at least 100/cm²,
and preferably at least 1,000/cm². Thus, for example, an
array having 100 µm size DNA spots, each separated by 100
µm, will have a density of about 2,500 regions/cm².
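
The density figure above follows from the center-to-center pitch of the spots. The sketch below reproduces that arithmetic, assuming a regular square grid (an assumption made here for illustration; the text does not prescribe a layout). The function name `array_density_per_cm2` is hypothetical.

```python
def array_density_per_cm2(spot_diameter_um, gap_um):
    """Regions per cm^2 for a square grid of spots.

    The center-to-center pitch is the spot diameter plus the gap
    between adjacent spots; 1 cm equals 10,000 micrometers.
    """
    pitch_um = spot_diameter_um + gap_um
    spots_per_cm = 10_000 / pitch_um
    return spots_per_cm ** 2


# 100 um spots separated by 100 um: a 200 um pitch, 50 spots per cm.
print(array_density_per_cm2(100, 100))  # 2500.0
```

The same pitch describes a 20 x 20 array laid out in a 4 mm x 4 mm area, which is why that array also works out to about 2,500 regions/cm².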

The non-porous substrate surface on which the
different DNA sequences are deposited may be any surface
capable of supporting an aqueous layer on the surface
without liquid absorption or flow into or through the
substrate; that is, any substrate surface which is
water-impermeable. Preferred substrates are glass slides,
metalized surfaces, non-porous polymer substrates, such
as polyethylene, polyurethane, or polystyrene sheet
material, or substrates which are coated with a
non-permeable film or layer.



In the case where the DNA sequences are to be
covalently attached to the substrate surface, the
substrate includes or is treated to include chemical groups,
such as silylated glass, carboxyl, amine, aldehyde,
or sulfhydryl groups. After deposition of the DNA
sequences on the substrate surface, and either before or
after drying the solution at each array location, the DNA
is fixed by covalent attachment to the surface. This may
be done, for example, by drying the DNA spots on the
array surface, and exposing the surface to a solution of
a cross-linking agent, such as glutaraldehyde,
borohydride, or any of a number of available bifunctional
agents. One such linking method is detailed in the
Methods section below.

Alternatively, the DNA sequences may be attached to
the substrate surface non-covalently, and typically by
electrostatic interaction between positively charged
surface groups and the negatively charged DNA molecules.
In one preferred embodiment, the substrate is a glass
slide having formed on its surface a coating of a
polycationic polymer, preferably a cationic polypeptide,
such as polylysine or polyarginine, as described in the
Methods section below. In experiments conducted in
support of the invention, as illustrated in Examples 1 and
2, it has been discovered that the non-covalently bound
polynucleotides remain bound to the coated slide surface
when an aqueous DNA sample is applied to the substrate
under conditions which allow hybridization of
reporter-labeled polynucleotides in the sample to
complementary-sequence (single-stranded) polynucleotides in the
substrate array.

The DNA sequences forming the microarray are
polynucleotides having lengths of about 50 bases or greater,
although shorter DNA polymers, e.g., in the range 15-50
bases, may also be employed. These DNA sequences can be
formed by a variety of methods, including solid-phase
synthesis, polymerase chain reaction (PCR) methods, gene
cloning, gene cloning in combination with PCR, and
solid-phase methods. The DNA sequences on the array have the
same sequence as some, and preferably a large number, of
the polynucleotides in the complex mixture which will be
reacted with the microarray. That is, the microarray
sequences will hybridize with selected polynucleotides in
the polynucleotide mixture. Moreover, the hybridization
is selective, meaning that one array sequence will
typically hybridize with one species only in the
polynucleotide mixture. This requirement would exclude
relatively short, e.g., less than 8-10 base,
oligonucleotides as array DNA sequences and/or combinatorial
oligonucleotides such as are used for sequencing by
hybridization.

Preferred sources of DNA sequences in the
microarrays are genomic sequences and mRNA-derived sequences,
as illustrated in the examples below. DNA sequences from
genomic sources are typically obtained from cloned
genomic fragments corresponding predominantly to
single-copy or low-copy number genes, i.e., excluding
repeat-sequence genomic fragments, or by primer-amplification of
transcribed regions of a genome.

The cloned fragments may be excised and/or
PCR-amplified, then purified by conventional methods, such as
described in the Methods below. Alternatively the
fragments may be obtained directly from genomic sources,
e.g., from genomic fragments that have been treated to
remove repeat sequences, and/or by PCR amplification
methods. For example, using computer-aided sequence
analysis, it is possible to construct PCR primers for
expressed genomic sequences, to selectively generate
these sequences. The purified genomic sequences applied
to the microarray surface may be partially or completely
sequenced, or may have unknown sequences.

DNA sequences derived from mRNA's can include mRNA's
themselves, but more commonly sequences produced by
reverse transcription of mRNA's, or DNA sequences
obtained from a library of cloned cDNA's, e.g., by excision
or primer amplification. The mRNA's are preferably
obtained from cells having a selected physiological state
or disease condition and from corresponding test cells,
as defined above, according to known methods. The
mRNA-derived sequences applied to the microarray surface may
be completely sequenced, partially sequenced, e.g., as
for expressed sequence tags (EST's), or may have unknown
sequences.

Fig. 1 shows a portion of a microarray 20 having a
substrate 22 whose surface 24 contains an array of
regions, such as at 26, containing different-sequence
DNA's. A 20 x 20 array in a (4 mm)² area, and thus having
a density of about 2,500 regions/cm², is shown in Fig. 2.
The array is formed as described in Example 1.

III. Microarray Method 

The method of the invention is designed to measure
the relative abundances of polynucleotides in a complex
mixture of polynucleotides. As defined above, a complex
mixture preferably contains at least about 50, and up to
1,000 or more polynucleotides with different sequences.
The complexity of the mixture is defined by the number of
unique polynucleotide sequences and is preferably at
least 10⁶, e.g., containing 1,000 different-sequence
polynucleotides having an average of at least 1,000
bases.

Genomic fragments of the type described in Section
II above represent one source of polynucleotides, where
the method is used to determine the relative abundance of
different genomic species, e.g., from a selected
chromosome or region of a chromosome, or chromosomes of a given
cell type. Methods of obtaining genomic fragments
suitable for the method are described above, or are
known, e.g., as described in U.S. Patent No. 5,376,526.

Polynucleotides derived from mRNA's as above repre- 
sent another source of polynucleotides, where the method 
is used to determine the relative abundance of mRNA's 
from a given cell source, for purposes of measuring the
relative levels of gene expression of a plurality of 
genes, e.g., from the cell source. Methods for obtaining 
mRNA-derived polynucleotides, e.g., corresponding to all 
of the expressed genes in a given cell type, are well 
known. 

In practicing the method of the invention, the 
polynucleotides in the mixture are labeled with a fluor- 
escent reporter. The fluorescent labeling can be carried 
out by standard methods, typically by transcribing DNA in 
the presence of fluorescence labeled nucleotides, such as 
Cy5-dNTP or fluorescent-labeled dNTP. Labeling may be 
carried out in vitro or in vivo. 
In one embodiment of the invention, described below,
mixtures of polynucleotides from two different cell
sources, referred to herein as test and control cells,
are labeled with different fluorescent reporters which
have independently detectable fluorescent properties,
typically different fluorescent-emission peaks. The two
independently labeled mixtures are then combined.

The mixture of labeled polynucleotides is contacted 
with the microarray surface, under conditions which 
permit hybridization between microarray sequences and 
complementary sequences in the mixture. The amount of 
polynucleotide mixture added to the microarray should be 
less than that which would saturate most of the array 
regions. In a preferred embodiment, a mixture of
polynucleotides having a solution concentration of between
about 0.1 to 10 µg DNA/µl is placed on the microarray
surface, in a total volume of between about 0.5-10 µl/cm²
microarray surface, and the surface is covered with a
glass coverslip. In this configuration, the "thickness"
of the polynucleotide mixture is preferably between 10
and 500 µm.
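
The relation between applied volume per unit area and the depth of the liquid layer under the coverslip is a unit conversion. The sketch below assumes the solution spreads evenly over the stated area; the function name `layer_thickness_um` is hypothetical.

```python
def layer_thickness_um(volume_ul_per_cm2):
    """Depth, in micrometers, of a liquid layer spread evenly.

    1 microliter = 1 mm^3 = 1e9 cubic micrometers, and
    1 cm^2 = 1e8 square micrometers, so depth = volume / area.
    """
    UM3_PER_UL = 1e9
    UM2_PER_CM2 = 1e8
    return volume_ul_per_cm2 * UM3_PER_UL / UM2_PER_CM2


# Each microliter per cm^2 corresponds to a 10 um layer.
for volume in (0.5, 1.0, 10.0):
    print(volume, "ul/cm2 ->", layer_thickness_um(volume), "um")
```

Under this even-spreading assumption, the stated 0.5-10 µl/cm² corresponds to layers of roughly 5-100 µm, below the 500 µm upper limit given for the solution depth.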

When the polynucleotide mixture is double-stranded
DNA, the double-stranded species may be denatured before
application to the microarray surface, e.g., by heating
above the denaturation temperature, then rapidly cooling,
as described in Example 1 below. Alternatively, the
double-stranded material may be denatured by heating
after addition to the slide, then cooled to permit
hybridization.

Standard hybridization conditions for hybridizing
labeled polynucleotides to the complementary-sequence DNA
sequences on the microarray are employed, preferably to
an end point at which complete hybridization is achieved,
as detailed in the Methods below. The salt and
temperature conditions of hybridization are selected for a
desired stringency, according to well known principles,
and/or the array is washed after hybridization at high
stringency salt and temperature conditions.

After hybridization, the microarrays are washed,
then scanned for fluorescence intensity. The scanning
may be carried out with a confocal laser scanning device,
as detailed in the Methods. From the scanned
fluorescence data, an image of the microarray can be
reconstructed, typically employing software to convert image
intensity, which may vary over 3-4 logs, to a color or
gray-scale intensity.
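
Because the intensities span several orders of magnitude, one plausible way to convert them to a gray scale is a logarithmic mapping. The sketch below is an illustration under that assumption, not the software used in the Methods; the function name `to_gray` and the default intensity bounds are hypothetical.

```python
import math


def to_gray(intensity, min_intensity=1.0, max_intensity=10_000.0):
    """Map a fluorescence intensity to an 8-bit gray level (0-255).

    The value is clipped to the given bounds and scaled on a log10
    axis, so a 4-log range is spread evenly across the gray scale.
    """
    clipped = min(max(intensity, min_intensity), max_intensity)
    span = math.log10(max_intensity) - math.log10(min_intensity)
    fraction = (math.log10(clipped) - math.log10(min_intensity)) / span
    return round(255 * fraction)


# Weak, intermediate, and saturating signals on a 4-log scale.
for value in (1, 10, 100, 1_000, 10_000):
    print(value, "->", to_gray(value))
```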

Where the array being scanned contains two
fluorescent labels, each label is scanned independently at a
selected excitation wavelength. After correcting for
optical crosstalk between the fluorophores, due to their
overlapping emission spectra, the combined pattern may be
represented as two separate images, in which the
intensity of each label at each array position is represented,
for example, in terms of a color or gray scale. Figs. 4A
and 4B, and 5A and 5B, discussed below, are illustrative.
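
The text does not say how the crosstalk correction is computed. One common approach, shown here purely as an assumption, treats each measured channel as the true signal plus a fixed fraction of the other label's signal and solves the resulting two linear equations. The function name `correct_crosstalk` and the leak fractions are hypothetical.

```python
def correct_crosstalk(meas_1, meas_2, leak_2_into_1, leak_1_into_2):
    """Recover per-label signals from two crosstalk-contaminated channels.

    Assumed linear model (not taken from the source):
        meas_1 = true_1 + leak_2_into_1 * true_2
        meas_2 = true_2 + leak_1_into_2 * true_1
    """
    det = 1.0 - leak_2_into_1 * leak_1_into_2
    if det == 0:
        raise ValueError("leak fractions make the system unsolvable")
    true_1 = (meas_1 - leak_2_into_1 * meas_2) / det
    true_2 = (meas_2 - leak_1_into_2 * meas_1) / det
    return true_1, true_2


# Simulate true signals of 100 and 50 with 10% and 5% leakage,
# then recover them from the contaminated measurements.
m1 = 100 + 0.10 * 50
m2 = 50 + 0.05 * 100
print(correct_crosstalk(m1, m2, 0.10, 0.05))
```

In practice the leak fractions would have to be measured, for example from arrays hybridized with only one label at a time.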

Alternatively, in the case of a two-color image,
each label may be represented with a different color,
e.g., red and green, at a color intensity corresponding
to the measured intensity of that fluorophore. This
representation has the advantage of allowing quick visual
assessment of microarray regions where one or the other
fluorophore dominates in polynucleotide abundance (as
evidenced, for example, by predominantly red or green
spots), or where equal abundances of the two species are
present (as evidenced, for example, by yellow spots).
This type of representation is seen in Figs. 6-8,
discussed below.
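
The two-color representation can be sketched by placing one label in the red channel and the other in the green channel of an RGB pixel, so that equal signals appear yellow. This is a minimal illustration with linear scaling; the function name `composite_rgb` and the `max_signal` parameter are hypothetical.

```python
def composite_rgb(red_signal, green_signal, max_signal):
    """Combine two label intensities into one (R, G, B) pixel.

    Each signal is scaled linearly to 0-255 and clipped; the blue
    channel is left at zero, so equal red and green give yellow.
    """
    def scale(value):
        fraction = min(max(value / max_signal, 0.0), 1.0)
        return round(255 * fraction)

    return (scale(red_signal), scale(green_signal), 0)


print(composite_rgb(1000, 0, 1000))     # (255, 0, 0)   red-dominated spot
print(composite_rgb(0, 1000, 1000))     # (0, 255, 0)   green-dominated spot
print(composite_rgb(1000, 1000, 1000))  # (255, 255, 0) yellow, equal signals
```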

As shown below, the method allows for detection
of changes in relative abundance between different
polynucleotides on the array, and between the same
polynucleotide from two different sources, of 20-50% or
less. Thus, where the method is used to determine
relative levels of gene expression in a plurality of
genes from two different cell sources, very small changes
in expression, e.g., 20%, can be detected. As will be
illustrated below, this sensitivity allows large numbers
of genes, e.g., all genes whose mRNA is present at a
level or at least l:io* and l:lo', to be simultaneously 

monitored for both major and slight variations in gene
expression, in response to a given change in cell state
or cell type. This capability, in turn, allows for the
simple identification of many new genes that are
up-regulated or down-regulated in response to a particular
shift in cell state or type.




IV. Applications 

The method described above has a variety of
applications, including genetic and physical mapping of genomes,
genetic diagnosis, genotyping of organisms, monitoring of
gene expression, and gene discovery.

A. Genetic Analysis 

For genetic analysis, a plurality of polynucleotides,
e.g., cloned genomic fragments, is hybridized to
an ordered array of DNA fragments, typically cloned
genomic fragments, and the identity of the DNA elements
applied to the array is unambiguously established by the
pixel or pattern of pixels of the array that are
detected.

One application of such arrays for creating a
genetic map is described by Nelson, et al. (1993). In
constructing physical maps of the genome, arrays of
immobilized cloned DNA fragments are hybridized with
other cloned DNA fragments to establish whether the
cloned fragments in the probe mixture overlap and are
therefore contiguous to the immobilized clones on the
array. For example, Lehrach, et al. (1990), describe
such a process.

Example 1 illustrates an application of the method
for studying genomic complexity in S. cerevisiae. Here
genomic fragments from the six largest yeast chromosomes
were labeled with one fluorescent tag, and fragments from
the 10 smallest chromosomes, with another tag, and the
two fragment mixtures were hybridized with a
microarray of DNA sequences representing cloned yeast
genomic fragments. The results are shown in Fig. 3A. A
red signal in the figure indicates that the lambda clone
on the array surface contains a cloned genomic DNA
segment from one of the largest six yeast chromosomes.
A green signal indicates that the lambda clone insert
comes from one of the smallest ten yeast chromosomes.
Orange signals indicate repetitive sequences which
cross-hybridized to both chromosome pools. Control spots on
the array confirm that the hybridization is specific and
reproducible.
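
The color interpretation just described can be sketched as a simple rule, offered only as an illustration. The function name classify_clone and the background and dominance thresholds are hypothetical; the specification does not state numerical cutoffs.

```python
def classify_clone(red, green, background=50.0, dominance=2.0):
    """Assign an array element to a chromosome pool from its two signals.

    background and dominance are assumed cutoffs, not values from the text.
    """
    if max(red, green) < background:
        return "no signal"                 # dark spot
    if red >= dominance * green:
        return "largest six chromosomes"   # red element
    if green >= dominance * red:
        return "smallest ten chromosomes"  # green element
    return "both pools"                    # orange, cross-hybridizing repeat

print(classify_clone(900.0, 100.0))
print(classify_clone(500.0, 450.0))
```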

The physical map locations of the genomic DNA 
fragments contained in each of the clones used as array 
elements have been previously determined by Olson and co- 
workers (Riles, et al., 1993), allowing for the automatic 
generation of the color karyotype shown in Figure 3B. 

The color of a chromosomal section on the karyotype
corresponds to the color of the array element containing
the clone from that section. The black regions of the
karyotype represent false negative dark spots on the
array (10%) or regions of the genome not covered by the
Olson clone library (90%). The largest six chromosomes
are mainly red while the smallest ten chromosomes are
mainly green, matching the original CHEF gel isolation of
the hybridization probe. Areas of the red chromosomes
containing green spots and vice-versa are probably due to
spurious sample tracking errors in the formation of the
original library and in the amplification and spotting
procedures.

It can be appreciated how this approach can be
applied to other types of genetic analysis, for example,
in comparative genomic hybridization, or for use in
determining the extent of genetic divergence between two
different species.

In all of these applications, the microarray method 
30 allows a large number of genomic sequences, e.g., 10^-10* 
to be examined on a single array, using only
a small amount of genetic material, and with the capability
of detecting changes or differences in the relative
abundance between labeled polynucleotides on the array of
as little as 20%.






B. Gene Expression Analysis 

In another general application, a microarray of cDNA
clones representing genes from a given source, or the
expressed genomic sequences from that source, is
hybridized with labeled cDNA's to monitor gene expression for
research or diagnostic purposes. In a one-color mode,
the method is used to examine relative expression levels
among the various genes represented on the microarray.
Alternatively, to monitor relative levels of gene
expression between the individual genes from test and control
cells, each labeled cDNA mixture can be hybridized with
an individual microarray, and the patterns of
fluorescence intensities in the two microarrays compared.

Relative levels of gene expression from two sources 
can be more advantageously studied, in accordance with 
the method, in single-array mode where the cDNA mixtures 
from two different cell sources are labeled with dif- 
ferent fluorescent labels, and hybridized simultaneously 
with a single microarray. The ratio of the two fluores- 
cent labels at each array position is a measure of the 
differential expression of that gene in the two cell 
sources. The ratio measurement is independent of the 
absolute abundance level of that gene. 
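
As a minimal sketch of the ratio measurement, assuming background-corrected signals and a hypothetical function name expression_ratio, two genes of very different absolute abundance can yield the same differential expression value:

```python
def expression_ratio(test_signal, control_signal):
    """Ratio of the two fluorescent labels at one array position."""
    if test_signal < 0 or control_signal <= 0:
        raise ValueError("signals must be non-negative, control positive")
    return test_signal / control_signal

# Illustrative signal values only: an abundant and a rare transcript,
# each twofold higher in the test cells than in the control cells.
abundant = expression_ratio(2000.0, 1000.0)
rare = expression_ratio(20.0, 10.0)
print(abundant, rare)
```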

It is also possible to use more than two fluorescent 
tags, when examining more than two cell sources simul- 
taneously. This approach greatly enhances the resolution 
of the method, i.e., the ability to detect small changes 
in expression level, by effectively eliminating varia- 
tions between two separate microarrays in fluorescent 
signals due to, for example, different amounts of target 
sequences on the arrays. 

The use of the two-color method for examining gene
expression levels in plants is illustrated in Example 2.
Arabidopsis plants containing an exogenous HAT-4 gene
were isolated, and cDNA from the two sources was prepared
and labeled with different fluorescent labels. The cDNA
mixture was hybridized to a single microarray containing
an array of cloned Arabidopsis cDNA's. Figs. 4A and 4B
show the fluorescent-intensity scans of the microarray
for the wild-type cDNA fluorescence label (4A) and the
transgenic cDNA fluorescence label (4B) (both from the
same microarray). The gene expression patterns differ in
several respects, but in particular, by the presence of
two strong spots in Fig. 4B (indicated at 49.50)
corresponding to HAT-4 cDNA.

Using an identical microarray, a similar study
was carried out to examine differences in the gene
expression patterns from root and leaf tissue from
Arabidopsis, shown in Figs. 5A and 5B, respectively. As
seen, the two tissues give quite different patterns of
expression for the particular cDNA sequences on the
array. Details are given in Example 2.

One of the important applications of the method is
the ability to identify genes that are associated with a
given cell state or type. The objective of such studies
is to identify new genes that can be used as diagnostic
indicators of the cell state or type, and to identify new
cellular enzymes and pathways, related to expression of
the newly discovered genes, that can serve as targets in
developing new therapeutic agents.

Heretofore, detecting levels of differential gene
expression between control and test cells has been
limited by the number of genes that can be examined on a
single array, and the sensitivity of array methods to
changes in the levels of hybridized probe. In general,
these limitations significantly increase the work
required to identify low-copy-number genes of interest, and
may even prevent such genes from being identified.




To illustrate the method as applied to identifying
new genes in human bone-stem cells, a microarray of human
cDNA clones, picked at random from a human peripheral
blood lymphocyte cDNA library, was prepared as detailed
in Example 3. The array contained 1,046 PBL clones,
and 10 sequences from Arabidopsis, as control sequences.

In a first study, a mixture of cDNA's from bone
marrow labeled with Cy5-dCTP and cDNA's from Jurkat cells
labeled with fluorescein-dCTP were hybridized to the
microarray, to assess the different patterns of gene
expression in the two blood-cell types. The scanned
array is shown in Fig. 6, where red spots indicate higher
gene expression in bone marrow, green spots, greater gene
expression levels in Jurkat cells, and yellow spots,
comparable gene expression levels in both cells. It is
apparent from the figure that the two cell types have
quite different levels of gene expression in many genes.

A second study was designed to identify genes
responsive to heat shock in the Jurkat cell line. A
number of heat-shock proteins in Jurkat and other cell
lines have been identified, as discussed below. It was
therefore of interest to see if the present method could
extend the number of genes known to be responsive.

In the study, Jurkat cells, after initial culturing,
were grown for 4 hours at 37°C and 43°C, respectively.
Total mRNA from the harvested control and heat-shocked
cells was labeled by reverse transcriptase incorporation
of fluorescein- or Cy5-derivatized dCTP, respectively, as
detailed in Example 3. The two cDNA fractions were mixed
and hybridized to the human PBL microarray above. The
combined fluorescent scans for the two fluorophores are
shown in Fig. 7.

Examination of the fluorescent scans revealed
positive hybridization signals to >95% of the human cDNA
array elements, but not to any of the Arabidopsis
controls. Hybridization intensities spanned more than three
orders of magnitude for the 1,046 array elements
surveyed. Comparative expression analysis of heat shocked
versus control cells in the two experiments revealed
altered fluorescence intensities at 17 array elements.
Of the 17 putative differentially expressed genes, 11
were induced by heat shock treatment and 6 displayed
modest repression. This result is indicated
schematically in Fig. 9A, which shows up-regulated and
down-regulated genes as red and green spots, respectively.

To determine the identity of the genes that
exhibited altered expression patterns, cDNAs corresponding
to each of the 17 array elements were subjected to single
pass DNA sequencing on the proximal end of each clone
(see Example 3 for details). Database searches of the
sequences revealed "hits" for 14 of the 17 clones (Table
1, B1-B17); the three remaining clones (B7-B9) did not
match any sequence in the public human database, though
one of the clones (B7) exhibited significant homology to
an expressed sequence tag (EST) from C. elegans. To
further confirm the identity of the clones, the
nucleotide sequence of the distal end of each cDNA was determined. In
all cases, proximal and distal cDNA sequences mapped to
the same gene.



Table 1

Clone  Row  Column  Ratio  Blast Identity      Accession #
B1     24   21      0.5    CYC oxidase III     J01415, J01415
B2     1    31      0.5    β-Actin             N.R., X00351
B3     15   8       0.5    CYC oxidase III     J01415, J01415
B4     32   19      0.5    CYC oxidase III     J01415, J01415
B5     17   8       0.5    CYC oxidase III     J01415, J01415
B6     22   31      0.5    β-Actin             N.R., X00351
B7     5    4       2.0    Novel               U56653, U56654
B8     2    19      2.0    Novel               U56655, U56656
B9     14   5       2.2    Novel               U56657, U56658
B10    7    8       2.4    Polyubiquitin       X04803, X04803
B11    12   2       2.4    TCP-1               X52882, X52882
B12    28   2       2.5    Polyubiquitin       M17597, M17597
B13    14   7       2.5    Polyubiquitin       X04803, X04803
B14    20   9       2.6    HSP90β              M16660, M16660
B15    30   12      4.0    DnaJ homolog        D13388, D13388
B16    10   5       5.8    HSP90α              X07270, X07270
B17    13   16      6.3    HSP90α              M27024, X15183
B18    7    19      2.0    β-2-microglobulin   S54761, M30683
B19    21   30      2.1    Novel               U56659, U56660
B20    3    26      2.2    β-2-microglobulin   S54761, M30683
B21    1    18      2.6    PG kinase           M11968, L00160
B22    22   30      3.5    NF-κB1              Z47744, M55643
B23    20   16      19     PAC-1               L11329, L11329


The five most highly induced genes in heat-treated
cells included heat shock protein 90α (HSP90α), DnaJ,
HSP90β, polyubiquitin, and t-complex polypeptide-1
(TCP-1) (Table 1). HSP90α, DnaJ, and HSP90β exhibited a
6.3-, 4.0-, and 2.6-fold induction, respectively; lesser
activation was observed for genes encoding polyubiquitin
and TCP-1 (Table 1). Three novel sequences (B7-B9) each
exhibited induction in the 2-fold range (Table 1). A
modest repression was observed for both β-actin and
cytochrome c (cyc) oxidase III (Table 1). In several
cases, clones corresponding to the same gene were
recovered at multiple locations on the array; expression
ratios for these clones varied by less than 10% from
element to element (Table 1).

To confirm that the changes in expression determined
with the microarray assay corresponded to altered mRNA
levels, each of the cloned sequences was used
in RNA blotting analyses, as described in Example 3. All
of the genes that displayed heat shock induction by
microarray analysis yielded similar results in "dot
blot" experiments (Figs. 10A-10L and Table 2, B1-B17).
The gene encoding HSP90α, for example, exhibited 6.3-fold
activation by microarray analysis and 7.2-fold induction
by RNA blotting (Fig. 10I; Table 2). In all cases,
expression ratios as determined by the two procedures
differed by less than 2-fold for the genes identified in
the heat shock experiments (Table 2). The two assays
differed more widely in terms of assessing absolute
expression levels; nonetheless, absolute expression as
monitored on a microarray typically correlated with RNA
blots to within a factor of five (Table 2).
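
The comparison between the two assays can be sketched as follows, as an illustration only. The function name fold_difference is hypothetical. The HSP90α pair is the one quoted in the text; the B8 pair takes the microarray ratio and the RNA blot expression levels (1.5 and 3.4) as listed in Table 2.

```python
def fold_difference(ratio_a, ratio_b):
    """Fold difference between two expression ratios (always >= 1)."""
    if ratio_a <= 0 or ratio_b <= 0:
        raise ValueError("ratios must be positive")
    return max(ratio_a, ratio_b) / min(ratio_a, ratio_b)

# (microarray ratio, RNA blot ratio)
pairs = {
    "HSP90-alpha (B17)": (6.3, 7.2),
    "Novel (B8)": (2.0, 3.4 / 1.5),
}

for name, (array_ratio, blot_ratio) in pairs.items():
    print(name, round(fold_difference(array_ratio, blot_ratio), 2))
```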



Table 2

                                   Expression Level (per 10^5)
Clone  Blast Identity              Microarray  Ratio  RNA Blot
B1     CYC oxidase III             92/46       0.5    100/80
B2     β-Actin                     240/120     0.4    270/280
B3     CYC oxidase III             36/18       0.5    N.D.
B4     CYC oxidase III             76/38       0.5    N.D.
B5     CYC oxidase III             62/31       0.5    N.D.
B6     β-Actin                     180/89      0.5    N.D.
B7     Novel (weakly to D76026)    1.3/2.6     2.0    0.77/1.8
B8     Novel                       2.0/4.0     2.0    1.5/3.4
B9     Novel                       0.8/1.8     2.2    1.2/1.8
B10    Polyubiquitin               0.8/72      2.4    25/89
B11    TCP-1                       2.3/77      2.4    7.1/27
B12    Polyubiquitin               0.8/2.0     2.5    N.D.
B13    Polyubiquitin               1.7/4.3     2.5    N.D.
B14    HSP90β                      75/200      2.6    30/120
B15    DnaJ homolog                1.0/4.0     4.0    1.6/13
B16    HSP90α                      0.6/3.5     5.8    3.2/29
B17    HSP90α                      0.8/5.0     6.3    8.6/62


Genes that exhibited positive regulation in
heat-treated T cells encode factors that either function
as molecular "chaperones" of protein folding (HSP90α,
HSP90β, DnaJ, TCP-1), or as mediators of selective
protein degradation (polyubiquitin). The identification
of these sequences is consistent with the biochemical basis of
heat shock induction. Many proteins undergo denaturation
at elevated temperatures, and those that fail to maintain
proper conformation must be selectively degraded (Jindal,
1996; Wilkinson, 1995; Jakob, et al., 1994; Becker and
Craig, 1994; Cyr, 1994; Craig, et al., 1994). It will be
interesting to determine whether the three novel heat
shock-inducible sequences (B7-B9) identified in this
study play a role in protein folding and turnover, or
possess some other biochemical activity. Complete
nucleotide sequence determination, conceptual
translation, expression monitoring, and biochemical analysis
should provide a clue to the function of these genes.

In summary, the method was successful in confirming
a number of known heat-shock genes and identifying three
new genes not previously associated with heat shock.

A third study was designed to identify genes whose
level of expression is related to treatment with phorbol
ester. Phorbol ester, a potent activator of the protein
kinase C family (Newton, 1995; Nishizuka, 1995),
activates a set of genes distinct from those involved in the
heat shock pathway.

Control and phorbol-ester treated Jurkat cells were
cultured as described in Example 4. Total cDNA from
control and drug-treated cells was labeled by reverse
transcriptase incorporation of fluorescein- or
Cy5-derivatized dCTP, respectively, as detailed in
Example 4. The two cDNA fractions were mixed and
hybridized to the human PBL microarray above. The combined
fluorescent scans for the two fluorophores are shown in
Fig. 8.

As above, examination of the fluorescent scans
revealed positive hybridization signals to >95% of the
human cDNA array elements, but not to any of the
Arabidopsis controls. Hybridization intensities spanned
more than three orders of magnitude for the 1,046 array
elements surveyed. Comparative expression analysis of
phorbol ester-treated versus control cells in the two experiments
revealed altered fluorescence intensities at 6 array
elements (Table 1 above, B18-B23), all of which showed
modest to strong up-regulation in response to
phorbol-ester treatment. The positions of the genes in the array
are illustrated in Fig. 9B (red spots).

To determine the identity of the genes that
exhibited altered expression patterns, cDNAs corresponding
to each of the 6 array elements identified above were
subjected to single pass DNA sequencing on the proximal
end of each clone, as above. Database searches of the
sequences revealed "hits" for 5 of the 6 clones (Table 1,
B18-B23), including the two most highly induced genes
(Table 1) which corresponded to a tyrosine phosphatase
(PAC-1) (Rohan, et al., 1993) and nuclear factor-kappa B1
(NF-κB1) (Thanos, et al., 1995; Baeuerle, et al., 1994;
Liou, et al., 1993). Modest activation was also observed
for genes encoding phosphoglycerate kinase (PGK),
β-2-microglobulin, and one additional sequence (B19) that did
not match any entry in the public database (Table 1).

Each of the phorbol ester-inducible genes identified
by microarray analysis displayed increases in
steady-state mRNA levels (Figs. 11A-11E; Table 2, B18-B23)
in RNA blotting experiments.

It is striking that, despite the previous intensive
analyses of both the heat shock and phorbol ester
pathways, 4 of the 15 sequences identified in the two studies
above represent novel human genes. The fact that the
four novel genes share the common features of relatively
low expression (about 1:50,000) and modest activation
(about 2.2-fold) suggests that these sequences may have
been simply overlooked in screens utilizing prior-art
differential techniques.

One way to examine the function of newly discovered
genes is to determine their expression profiles in
different tissues. To explore this, probes were prepared
from human bone marrow, brain, prostate and heart by
labeling mRNA with reverse transcriptase in the presence
of Cy5-dCTP. In a separate reaction, a control probe
was prepared by labeling total Jurkat mRNA with
fluorescein-dCTP. The four Cy5-labeled tissue samples
were each mixed with an aliquot of the
fluorescein-labeled Jurkat probe, and the two-color probe mixtures
were hybridized to four separate microarrays. The four
arrays were then washed and scanned for fluorescence
emission. Hybridization signals for each of the four
samples were normalized internally to the Jurkat control
and an expression profile was generated for each of the
1,046 genes represented on the array.

Detectable expression was observed for all 15 of the
heat shock- and phorbol ester-regulated genes in the four
tissue types examined. In general, the expression level
of each gene in Jurkat cells correlated rather closely
with expression in the four tissues. Genes encoding
β-actin and cytochrome c oxidase, the two most highly
expressed of the 15 genes in Jurkat cells (Table 2), were
also highly expressed in bone marrow, brain, prostate,
and heart (Fig. 12A). Similarly, genes expressed at
moderate or low levels in Jurkat cells (Table 2)
displayed moderate to weak expression in the four tissue
types (Figs. 12B, 12C). One of the novel heat shock genes
(B7) showed an expression profile similar to the
mitochondrial gene encoding CYC oxidase III.

From the foregoing, it can be appreciated how
various objects and features of the method of the
invention are met. The microarrays used in the experiment can
be designed for probing a wide range of genes, e.g., from
particular species, cell types, or cell states, and an 

10-10 to be examined on a single array surface. This 
not only reduces the amount of work needed to screen
large numbers of genes, but because a single array
provides its own internal controls, allows higher
resolution in determining relative levels of abundance of
polynucleotide probe species on the array.

In the two-color, single-array mode, the method
offers the additional advantage of an internal control in
the levels of fluorescent signals associated with cDNA's
obtained from two different sources. The greater
resolution this feature provides allows detection of
differential gene expression, either up-regulation or
down-regulation, of as low as 20%. As a result, it is
possible not only to scan very large collections of gene
transcripts on a single array, but also to detect
relatively minor changes in gene expression in each array
sequence. Another advantage of the two-color scheme is
that even if the number of binding sites of the arrayed
target DNA elements becomes limiting, the competitive
hybridization of the two probes will populate the sites in a
ratio reflecting their relative abundance.

The studies involving heat shock
or phorbol-ester treatment illustrate the ability of the
method to rapidly identify new genes associated with such
cell states.

The following examples illustrate, but in no way are 
intended to limit, the present invention. 

V. General Methods

A. Microarray Preparation

Target messenger nucleic acid DNA fragments were
prepared as described in the examples. In one general
embodiment (DNA sequences non-covalently linked to the
microarray surface), the microarrays were fabricated on
microscope slides which were coated with a layer of
poly-L-lysine (Sigma). For each microarray region, an
automated apparatus loaded 1 µl of the concentrated target
DNA in 3 x SSC directly from 96 well storage plates into
the open capillary printing element and deposited about
5 nl of sample per slide at desired spacing between spots
(Shalon, 1995), as described also in above-cited
WO95/35505.

After the spotting operation was complete, the
slides were rehydrated in a humid chamber for 2 hours,
baked in a dry 80°C vacuum oven for 2 hours, rinsed to
remove un-absorbed DNA and then treated with succinic
anhydride to reduce non-specific adsorption of the
labeled hybridization probe to the poly-L-lysine coated
glass surface. Immediately prior to use, the immobilized
DNA on the array was denatured in distilled water at 90°C
for 2 minutes.

In another general embodiment (DNA sequences
covalently linked to the microarray surface), target
sequences containing a reactive primary amine group on
their 5' end were arrayed on a 1.0 cm² glass surface of
silylated microscope slides (CEL Associates), using the
high-speed robotic printing method described above.

The target DNA sequences were linked covalently to
the glass surface and heat denatured to allow
hybridization. Briefly, the printed arrays were incubated for 4
hours in a humid chamber to allow rehydration of the
array elements, rinsed once in 0.2% sodium dodecyl
sulfate (SDS) for 1 min, twice in H2O for 1 min, and once
in sodium borohydride solution (1.0 g NaBH4 dissolved in
300 ml phosphate buffered saline (PBS) and 100 ml 100%
ethanol). The arrays were submerged in H2O for 2 min
at 95°C, transferred quickly into 0.2% SDS for 1 min, rinsed
twice in H2O, air dried, and stored in the dark at 25°C.



B. Preparation of Reporter-Labeled Messenger
Nucleic Acid

Total RNA was isolated from a selected cell or
tissue source using standard methods (Sambrook, et al.,
1988). PolyA+ mRNA was prepared from total RNA using
"OLIGOTEX-DT" resin (Qiagen). Reverse transcription
reactions were carried out using a "STRATASCRIPT" RT-PCR
kit (Stratagene) modified as follows: 50 µl reactions
contained 0.1 µg/µl mRNA, 0.1 ng/µl human acetylcholine
receptor mRNA, 0.05 µg/µl oligo-dT (21mer), 1x first
strand buffer, 0.03 units/µl RNase block, 500 µM dATP,
500 µM dGTP, 500 µM dTTP, 40 µM dCTP, 40 µM
fluorescein-12-dCTP (or lissamine-5-dCTP) and 0.03 units/µl
"STRATASCRIPT" reverse transcriptase. Reactions were
incubated for 60 min at 37°C, precipitated with ethanol,
and resuspended in 10 µl TE.

The samples were then heated for 3 min at 94°C and
chilled on ice. RNA was degraded by adding 0.25 µl 10N
NaOH followed by a 10 min incubation at 37°C. The
samples were neutralized by adding 2.5 µl 1M Tris-HCl (pH
8.0) and 0.25 µl 10N HCl, and precipitated with ethanol.
Pellets were washed with 70% ethanol, dried to completion
in a "SPEEDVAC" (Savant, Farmingdale, NY), resuspended in
10 µl H2O, and reduced to 3.0 µl in a "SPEEDVAC".
Fluorescent nucleotide analogs were obtained from DuPont NEN
(Boston, MA).

C. Hybridization of Reporter-Labeled Nucleic Acid
to Target DNA

Hybridization reactions contained 1.0 µl of
fluorescent cDNA synthesis product (~2 µg) and 1.0 µl of
hybridization buffer (10 x SSC, 0.2% sodium dodecyl
sulfate). The 2.0 µl probe mixtures were aliquoted onto the
microarray surface and covered with 12 mm round cover
slips. Arrays were transferred to a waterproof slide
chamber having a cavity just slightly larger than a
microscope slide. The chamber was kept at 100% humidity
internally by the addition of 2 microliters of water in
a corner of the chamber. The chamber containing the
arrays was incubated for 18 hr at 65°C.

The arrays were washed for 5 min at room temperature
(25°C) in low stringency wash buffer (1 x SSC, 0.1% SDS),
then for 10 min at room temperature in high stringency
wash buffer (0.1 x SSC, 0.1% SDS).

D. Detection of Hybridized Sequences

The microscope used to detect the reporter-labeled
hybridization complexes was outfitted with an Innova 70
mixed gas 10 W laser (Coherent Lasers, Santa Clara, CA)
capable of generating a number of spectral lines,
including lines at 488 nm and 568 nm, and 632 nm (for Cy5).
The excitation laser light was focused on the array using
a 20X microscope objective (Nikon).

The slide containing the array was placed on a
computer-controlled X-Y stage on the microscope and
raster-scanned past the objective. The 1.8 cm x 1.8 cm
array used in the present example was scanned with a
resolution of 20 µm. Spatial resolutions up to a few
micrometers are possible with appropriate optics.
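
As a worked illustration of the scan dimensions, assuming a square array 1.8 cm on a side and one sample every 20 µm along each axis, the size of the resulting image can be computed as below. The function name pixels_per_side is hypothetical.

```python
def pixels_per_side(side_um, step_um):
    """Number of raster samples along one edge of the scanned area."""
    return side_um // step_um

side = pixels_per_side(18_000, 20)  # 1.8 cm = 18,000 µm
total = side * side

print(side)   # 900
print(total)  # 810000
```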

In two separate scans, a mixed gas multiline laser
excited the two fluorophores sequentially. Emitted light
was split, based on wavelength, into two photomultiplier
tube detectors (PMT R1477, Hamamatsu Photonics, San Jose,
CA) corresponding to the two fluorophores. Appropriate
filters positioned between the array and the
photomultiplier tubes were used to filter the signals. The
emission maxima of the fluorophores used were 517 nm
(fluorescein), 588 nm (lissamine), and 650 nm (Cy5). Each
array was typically scanned twice (one scan per
fluorophore, using the appropriate filters at the laser source),
although the apparatus was capable of recording the
spectra from both fluorophores simultaneously.

The sensitivity of the scans was typically
calibrated using the signal intensity generated by an mRNA or
cDNA control species added to the hybridization mix at a
known concentration. For example, in the experiments
described in Example 3, human acetylcholine receptor mRNA
was added to the wild-type Arabidopsis poly-A total mRNA
sample at a weight ratio of 1:10,000. A specific
location on the array contained a complementary DNA sequence,
allowing the intensity of the signal at that location to
be correlated with a weight ratio of hybridizing species
of 1:10,000.
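
A single-point calibration of this kind might be sketched as follows. The sketch assumes that signal scales linearly with abundance, which the text does not state, and the function name abundance_from_signal is hypothetical.

```python
def abundance_from_signal(spot_signal, control_signal,
                          control_abundance=1 / 10_000):
    """Estimate the weight ratio of a hybridizing species from its signal,
    using one control spot of known abundance (linear response assumed)."""
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return spot_signal / control_signal * control_abundance

# Illustrative values: a spot five times brighter than the 1:10,000 control.
print(abundance_from_signal(500.0, 100.0))
```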

When messenger nucleic acid-derived probes
containing two different fluorophores (e.g., representing
test and control cells) are hybridized to a single array
for the purpose of identifying genes that are
differentially expressed, a similar calibration scheme may be
employed to normalize the sensitivity of the
photomultiplier tubes such that genes expressed at the same
levels in the test and control samples display the same
pseudocolor intensity. In one embodiment, this
calibration is done by labeling samples of the calibrating cDNA
with the two fluorophores and adding identical amounts of
each to the hybridization mixture.
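
The two-channel normalization can be sketched as below, as an illustration under the assumption that a single scale factor suffices. The names channel_scale and normalized_ratio are hypothetical.

```python
def channel_scale(cal_red, cal_green):
    """Factor that equalizes the red channel to the green channel, taken
    from a calibrating cDNA present in identical amounts in both labels."""
    if cal_red <= 0 or cal_green <= 0:
        raise ValueError("calibration signals must be positive")
    return cal_green / cal_red

def normalized_ratio(red, green, scale):
    """Red/green ratio after rescaling the red channel."""
    if green <= 0:
        raise ValueError("green signal must be positive")
    return (red * scale) / green

# Illustrative values: the red detector reads twice as high as the green.
scale = channel_scale(800.0, 400.0)
print(normalized_ratio(1600.0, 800.0, scale))  # equally expressed gene
```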

It will be understood that where greater confidence 
in the absolute levels of expression is desired, multi- 
point calibrations may be performed. 

E. Analysis of Patterns of Reporter Levels

The output of the photomultiplier tube was digitized
using a 12-bit RTI-835H analog-to-digital (A/D)
conversion board (Analog Devices, Norwood, MA) installed in an
IBM-compatible PC computer. The digitized data were
displayed as an image where the signal intensity was
mapped using a linear 20-color transformation to a
pseudocolor scale ranging from blue (low signal) to red
(high signal).
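
A linear 20-color transformation of 12-bit data might look like the following sketch. The function name pseudocolor_index and the equal-width binning are assumptions; the palette itself (blue through red) is not reproduced here.

```python
def pseudocolor_index(adc_value, n_colors=20, adc_max=4095):
    """Map a 12-bit A/D reading to one of n_colors equal-width bins;
    index 0 would display as blue and the last index as red."""
    if not 0 <= adc_value <= adc_max:
        raise ValueError("adc_value outside the converter range")
    return adc_value * n_colors // (adc_max + 1)

for reading in (0, 2048, 4095):
    print(reading, pseudocolor_index(reading))
```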

The data were also analyzed quantitatively. In
cases where two different fluorophores were excited and
measured simultaneously, the data were first corrected
for optical crosstalk (due to overlapping emission
spectra) between the fluorophores using each
fluorophore's emission spectrum.
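
One way to sketch such a correction is as a two-by-two linear unmixing, assuming each detector records its own fluorophore plus a fixed fraction of the other. The leak fractions and the function name correct_crosstalk are hypothetical; in practice the fractions would be derived from the emission spectra and filter passbands.

```python
def correct_crosstalk(measured_a, measured_b, leak_a_into_b, leak_b_into_a):
    """Recover the true signals A and B from two detector readings.

    Assumed model:
        measured_a = A + leak_b_into_a * B
        measured_b = B + leak_a_into_b * A
    """
    det = 1.0 - leak_a_into_b * leak_b_into_a
    if det <= 0:
        raise ValueError("leak fractions too large to unmix")
    true_a = (measured_a - leak_b_into_a * measured_b) / det
    true_b = (measured_b - leak_a_into_b * measured_a) / det
    return true_a, true_b

# Illustrative values: true signals 100 and 50 with 10% and 5% leakage.
print(correct_crosstalk(102.5, 60.0, 0.10, 0.05))
```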

A grid was superimposed over the fluorescence signal
image such that the signal from each spot was centered in
each element of the grid. The fluorescence signal within
each element was then integrated to obtain a numerical
value corresponding to the average intensity of the
signal. The software used for the above analyses was
similar in functionality to "IMAGE-QUANT", available from
Molecular Dynamics (Sunnyvale, CA).
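
The grid integration step can be sketched as follows. This is not the software referred to above; the function name integrate_grid and the square, evenly spaced grid are assumptions made for the illustration.

```python
def integrate_grid(image, cell):
    """Average the pixel values inside each cell x cell grid element.

    image is a list of equal-length rows; pixels beyond the last
    complete grid element are ignored.
    """
    n_rows = len(image) // cell
    n_cols = len(image[0]) // cell
    averages = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            total = sum(
                image[y][x]
                for y in range(r * cell, (r + 1) * cell)
                for x in range(c * cell, (c + 1) * cell)
            )
            row.append(total / (cell * cell))
        averages.append(row)
    return averages

image = [
    [1, 1, 4, 4],
    [1, 1, 4, 4],
    [0, 0, 8, 8],
    [0, 0, 8, 8],
]
print(integrate_grid(image, 2))
```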

Example 1

Genomic-Complexity Hybridization to Micro
DNA Arrays Representing the Yeast
Saccharomyces cerevisiae Genome with
Two-Color Fluorescent Detection

The array elements were randomly amplified PCR
(Bohlander, et al., 1992) products using physically
mapped lambda clones of S. cerevisiae genomic DNA
templates (Riles, et al., 1993). The PCR was performed
directly on the lambda phage lysates resulting in an
amplification of both the 35 kb lambda vector and the
5-15 kb yeast insert sequences in the form of a uniform
distribution of PCR product between 250-1500 base pairs
in length. The PCR product was purified using Sephadex
G50 gel filtration (Pharmacia, Piscataway, NJ) and
concentrated by evaporation to dryness at room
temperature overnight. Each of the 864 amplified lambda clones
was rehydrated in 15 µl of 3 x SSC in preparation for
spotting onto the glass.

The microarrays were fabricated on microscope slides
coated with a layer of poly-L-lysine, as above.
Immediately prior to use, the immobilized DNA on the array was
denatured in distilled water at 90°C for 2 minutes.

For the pooled chromosome experiment, the 16
chromosomes of Saccharomyces cerevisiae were separated in a
CHEF agarose gel apparatus (Biorad, Richmond, CA). The
six largest chromosomes were isolated in one gel slice
and the smallest 10 chromosomes in a second gel slice.
The DNA was recovered using a gel extraction kit (Qiagen,
Chatsworth, CA). The two chromosome pools were randomly
amplified in a manner similar to that used for the target
lambda clones. Following amplification, 5 micrograms of
each of the amplified chromosome pools were separately
random-primer labeled using Klenow polymerase (Amersham,
Arlington Heights, IL) with a lissamine-conjugated
nucleotide analog (Dupont NEN, Boston, MA) for the pool
containing the six largest chromosomes, and with a
fluorescein-conjugated nucleotide analog (BMB) for the
pool containing the smallest ten chromosomes. The two pools
were mixed and concentrated using an ultrafiltration
device (Amicon, Danvers, MA).

Five micrograms of the hybridization probe
consisting of both chromosome pools in 7.5 µl of TE was
denatured in a boiling water bath and then snap cooled on
ice. 2.5 µl of concentrated hybridization solution was
added (final concentration 5 x SSC and 0.1% SDS), and all
10 µl transferred to the array surface, covered with a
cover slip, placed in a custom-built single-slide
humidity chamber and incubated at 60°C for 12 hours. The
slides were then rinsed at room temperature in 0.1 x SSC
and 0.1% SDS for 5 minutes, cover slipped and scanned.
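The dilution arithmetic implied by this step is worked through below as a rough check. It reads the probe and hybridization solution volumes as 7.5 and 2.5 microliters and assumes the TE contributes no SSC or SDS. The strength of the concentrated solution is not stated in the text, so the values computed here are an inference and not a reported recipe.

```python
def stock_concentration(final_concentration, stock_volume, total_volume):
    """Concentration a stock must have to reach final_concentration after dilution."""
    if stock_volume <= 0 or total_volume < stock_volume:
        raise ValueError("volumes must satisfy 0 < stock_volume <= total_volume")
    return final_concentration * total_volume / stock_volume


probe_volume_ul = 7.5        # probe in TE
solution_volume_ul = 2.5     # concentrated hybridization solution
total_ul = probe_volume_ul + solution_volume_ul

# Inferred strength of the concentrated solution (assumes TE adds no SSC or SDS).
ssc_stock = stock_concentration(5.0, solution_volume_ul, total_ul)   # multiples of SSC
sds_stock = stock_concentration(0.1, solution_volume_ul, total_ul)   # percent SDS

# Five micrograms of probe in the final volume.
probe_ug_per_ul = 5.0 / total_ul
```

Under these assumptions the concentrated solution would be about 20 x SSC and 0.4% SDS, and the probe concentration during hybridization about 0.5 µg/µl.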





After correcting for optical crosstalk between the
fluorophores due to their overlapping emission spectra,
the red and green hybridization values for each clone on
the array were correlated to the known physical map
position of the clone resulting in a computer-generated
color karyotype of the yeast genome, as shown in Figs. 3A
and 3B, discussed above.
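The correlation of hybridization values with map position can be sketched as follows. The record layout (chromosome number, position, red value, green value) and the use of the red fraction as the display value are assumptions chosen for illustration; the text does not state how the karyotype was computed.

```python
def color_karyotype(clones):
    """Order clones by physical map position and compute a red fraction for each.

    clones is an iterable of (chromosome, position_kb, red, green) tuples.
    Returns {chromosome: [(position_kb, red_fraction), ...]} in map order.
    """
    karyotype = {}
    for chromosome, position, red, green in sorted(clones, key=lambda c: (c[0], c[1])):
        total = red + green
        # A clone with no signal in either channel has no defined color.
        red_fraction = red / total if total > 0 else None
        karyotype.setdefault(chromosome, []).append((position, red_fraction))
    return karyotype
```

In such a display, clones from chromosomes in the lissamine-labeled pool would be expected to show a red fraction near 1 and clones from the fluorescein-labeled pool a fraction near 0.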

Example 2

Fluorescence Detection of Gene Expression Patterns
using Micro Arrays of Arabidopsis cDNA Clones

A. Microarray Preparation

Target messenger nucleic acid DNA fragments were
made by amplifying the gene inserts from 45 different
Arabidopsis thaliana cDNA clones and 3 control genes
using the polymerase chain reaction (PCR; Mullis, et
al.). The DNA fragments comprising the PCR product from
each of the 48 reactions were purified using "QIAQUICK"
PCR purification kits (Qiagen, Chatsworth, CA), eluted in
ddH2O, dried to completion in a vacuum centrifuge and
resuspended in 15 µl of 3X sodium chloride/sodium citrate
buffer (SSC). The capacity of the "QIAQUICK"
purification kits is 10 µg of DNA; accordingly, each sample
contained about 10 µg or less of DNA.

The samples were then deposited in individual wells
of a 96 well storage plate with each sample split among
two adjacent wells as a test of the reproducibility of
the arraying and hybridization process. The samples were
spotted on poly-l-lysine-coated microscope slides, as
above, to produce a microarray with regions about 500 µm
apart.
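Because each of the 48 samples occupies two adjacent wells, the 96 array elements form 48 duplicate pairs. The sketch below shows one way to index them. It assumes that elements are numbered in well order starting from 1, so that sample k occupies elements 2k-1 and 2k; this is consistent with the element pairs listed in Table 3 but is not stated explicitly in the text.

```python
def elements_for_sample(sample_number):
    """Return the two adjacent element numbers holding a given sample (1-based)."""
    first = 2 * sample_number - 1
    return (first, first + 1)


def sample_for_element(element_number):
    """Return the sample number (1-based) spotted at a given array element."""
    return (element_number + 1) // 2
```

Comparing the two signals within each pair gives a direct measure of the reproducibility of arraying and hybridization.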

The positions of several specific elements in the
96-element array, and the reasons for their inclusion,
are indicated in Table 3, below. The remaining elements
of the array consist of known or unknown genes selected
from an Arabidopsis cDNA library.

Table 3

Element   Gene                                 Purpose

1, 2      Human acetylcholine receptor gene    Control for expression level
13, 14    Chlorophyll binding protein gene     Gene with known expression
35, 36    Rat glucocorticoid receptor gene     Positive and negative control
49, 50    HAT4 transcription factor gene       Gene with known expression
95, 96    Yeast TRP4 gene                      Positive and negative control


B. Methods 

Total poly-A RNA was isolated from plant tissue of
Arabidopsis using standard methods, and was used to
prepare cDNA labeled with either fluorescein-12-dCTP or
lissamine-5-dCTP, as above. Hybridization and scanning
were carried out as above.

C. Two-color Detection of Differential Gene
Expression in Wild Type versus Transgenic
Arabidopsis Tissue

Differential gene expression was investigated using
a simultaneous, two-color hybridization scheme, which
served to minimize experimental variation inherent in
comparing independent hybridizations. Two µg of
wild-type Arabidopsis total cDNA that were labeled with
fluorescein (as above) were combined with two micrograms
of transgenic Arabidopsis total cDNA that were labeled by
incorporating lissamine-5-dCTP (DuPont NEN) in the
reverse transcription step and hybridized simultaneously
to a microarray containing the same pattern of spotted
cDNAs as described above.

To test whether overexpression of a single gene
could be detected in a pool of total Arabidopsis mRNA,
methods of the invention were used to analyze a
transgenic line overexpressing the transcription factor HAT4
(Schena, et al., 1995). The transgenic Arabidopsis
tissue was known to express HAT4 at levels of 0.5% of the
total transcripts, while wild-type expression of HAT4 was
only 0.01% of total transcripts (as previously determined
by Northern analysis; Schena, et al., 1995).

Human acetylcholine receptor mRNA was added to the
wild-type Arabidopsis poly-A total mRNA sample at a
weight ratio of 1:10,000 and into the transgenic
Arabidopsis poly-A total mRNA sample at a weight ratio of
1:100 to roughly match the expected expression levels of
HAT4.
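The phrase "roughly match" can be made concrete with a short calculation using only the figures given above. The helper name is hypothetical, and exact fractions are used simply to avoid rounding.

```python
from fractions import Fraction


def fold_difference(high, low):
    """Return how many times larger high is than low, as an exact fraction."""
    return Fraction(high) / Fraction(low)


# HAT4 as a percentage of total transcripts: transgenic versus wild type.
hat4_fold = fold_difference("0.5", "0.01")

# Spiked acetylcholine receptor mRNA, by weight: 1:100 versus 1:10,000.
spike_fold = fold_difference(Fraction(1, 100), Fraction(1, 10000))
```

The HAT4 levels differ 50-fold and the spiked control 100-fold, so the control brackets the expected HAT4 difference to within a factor of two.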

As a cross-check of the negative controls, linear
PCR was used to generate single-stranded
fluorescein-labeled rat glucocorticoid receptor DNA and
lissamine-labeled yeast TRP4 DNA. The two PCR products were added
to the hybridization solution at a partial concentration
of ~1:100. The two fluorophores were excited separately
in two separate scans in order to minimize optical
crosstalk.

The array was then scanned separately for
fluorescein and lissamine emission following independent
excitation of the two fluorophores as described in
Example 2, above. The results of the experiments are
shown in Figs. 4A and 4B.

D. Two-color Detection of Differential Gene
Expression in Root versus Leaf Tissue

In a similar experiment using the same labeling and
hybridization procedures described above, 2 µg of total
cDNA from Arabidopsis root tissue labeled with
fluorescein were combined with two micrograms of total cDNA
from Arabidopsis leaf tissue labeled with lissamine and
were simultaneously hybridized to a microarray
containing the same pattern of target sequences described above.

The acetylcholine receptor gene mRNA was added to both poly-A
total mRNA samples at 1:1,000 to allow for normalization
of fluorescence intensities. The glucocorticoid and TRP4
controls were added to the hybridization probe as before.
The results are shown in Figs. 5A and 5B.
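Because the control mRNA is present at the same ratio in both samples, its signal can serve as a reference for scaling one channel to the other. The sketch below shows one simple scheme, a single global scale factor taken from the control element. The text does not describe the normalization procedure in detail, so this scheme and the function name are assumptions.

```python
def normalize_to_control(red, green, control_red, control_green):
    """Scale green-channel values so that the spiked control reads equally in both channels.

    red and green are equal-length sequences of per-element intensities.
    Returns a list of (red, scaled_green) pairs.
    """
    if control_green <= 0:
        raise ValueError("control must give a positive green signal")
    scale = control_red / control_green
    return [(r, g * scale) for r, g in zip(red, green)]
```

After scaling, an element whose red and green values are equal is read as equally abundant in the two tissues.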






Example 3 

Differential Gene Expression Due to Heat Shock
in Human T Cells (Jurkat Cell Line)



A. Constructing a Human Gene Expression Microarray 

Human cDNA clones were picked at random from a human
peripheral blood cDNA library, and propagated as
bacterial cultures. The human cDNA library was made using
mRNA isolated from human peripheral lymphocytes
transformed with the Epstein-Barr Virus (EBV). Inserts >600
bases were cloned into the lambda vector λYES-R to
generate 10^7-10^8 recombinants. Bacterial transformants
were obtained by infecting E. coli strain JM107/λKC.
Colonies were picked, propagated in a 96-well format, and
minilysate DNA was prepared by alkaline lysis using REAL
preps (Qiagen).

Plasmid DNA was isolated and inserts from each clone
were amplified by use of the polymerase chain reaction
(PCR) and purified. Inserts were amplified by PCR in a
96-well format using primers (PAN132, 5'
CCTCTATACTTTAACGTCAAGG; PAN133, 5' TTGTGTGGAATTGTGAGCGG)
complementary to the λYES polylinker and containing a six
carbon amino modification (Glen Research) on the 5' end.
PCR products were purified in a 96-well format using
QIAquick columns (Qiagen).





A total of 1,056 purified PCR inserts representing
1,046 human clones and 10 Arabidopsis controls were
arrayed onto a 1.0 cm² glass surface of a glass slide, and
attached covalently to the slide surface as detailed
above.

B. Examining the Heat Shock Response in Human
Cells

Human T (Jurkat) cells were grown in a tissue
culture incubator (37°C and 5% CO2) in RPMI medium
supplemented with 10% fetal bovine serum, 100 µg/ml
streptomycin, and 500 U/ml penicillin. The cells were
propagated to near confluence under normal growth
conditions, divided into two equal aliquots, and grown for 4
hours at 37°C and 43°C, respectively. Cells from the
control (37°C) and heat shocked culture (43°C) were
harvested, lysed (Ausubel, et al., 1994), and total mRNA
from the two cell samples was labeled by reverse
transcriptase incorporation of fluorescein- or
Cy5-derivatized dCTP, respectively.

Arabidopsis control mRNAs were made by in vitro
transcription of cloned HAT4, HAT22, and YesAt-23 cDNAs
(Cohen, et al., 1995; Chan, et al., 1994; Crabtree, et
al., 1994) using an RNA Transcription Kit (Stratagene).
For quantitation, the mRNAs were doped into the RT
reaction at ratios of 1:100,000, 1:10,000, and 1:1,000
(w/w) respectively.
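Doped controls at known ratios allow fluorescence intensity to be related to mRNA abundance. One way to use them, sketched below, is a straight-line fit in log-log space. The fitting method, the function names, and the intensity values in the usage note are assumptions for illustration; the text does not state how the quantitation was carried out.

```python
import math


def fit_log_calibration(abundances, intensities):
    """Least-squares fit of log10(abundance) against log10(intensity).

    Returns (slope, intercept) of the fitted line.
    """
    xs = [math.log10(i) for i in intensities]
    ys = [math.log10(a) for a in abundances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_abundance(intensity, slope, intercept):
    """Convert an intensity to an estimated abundance ratio using the fitted line."""
    return 10 ** (slope * math.log10(intensity) + intercept)
```

If the three controls gave hypothetical intensities of 10, 100 and 1,000 units, an array element reading 50 units would be estimated at roughly 1:20,000 of the total mRNA by weight.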

Hybridizations and array scanning were carried out
as above. To avoid complications arising from
fluor-specific effects, the fluors were "swapped" in a
second set of labeling reactions, such that samples from
control and heat shock-treated cells were labeled with
Cy5- and fluorescein-dCTP, respectively. Each pair of
fluorescent probes was mixed and hybridized to two 1,056
element human gene expression microarrays. The arrays
were washed at high stringency and scanned with a
confocal laser scanning device to detect emission of the two
fluors.

Examination of the fluorescent scans revealed
positive hybridization signals to >95% of the human cDNA
array elements, but not to any of the [illegible]
controls. Hybridization intensities spanned more than three
orders of magnitude for the 1,046 array elements
surveyed. Comparative expression analysis of heat shocked
versus control cells in the two experiments revealed
altered fluorescence intensities at 17 array elements.
Of the 17 putative differentially expressed genes, 11
were induced by heat shock treatment and 6 displayed
modest repression.
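A comparison across the two fluor-swapped experiments can be sketched as below. An element is called induced or repressed only when both experiments agree. The 2-fold threshold, the data layout and the function name are assumptions for illustration; the text does not state the criterion used to select the 17 elements.

```python
import math


def classify_elements(experiment_1, experiment_2, fold_threshold=2.0):
    """Call each element induced, repressed or unchanged across two swapped experiments.

    Each experiment maps an element id to (heat_shock_signal, control_signal),
    already assigned to the correct sample regardless of which fluor was used.
    """
    cutoff = math.log2(fold_threshold)
    calls = {}
    for element, (heat_1, control_1) in experiment_1.items():
        heat_2, control_2 = experiment_2[element]
        log_1 = math.log2(heat_1 / control_1)
        log_2 = math.log2(heat_2 / control_2)
        if log_1 >= cutoff and log_2 >= cutoff:
            calls[element] = "induced"
        elif log_1 <= -cutoff and log_2 <= -cutoff:
            calls[element] = "repressed"
        else:
            calls[element] = "unchanged"
    return calls
```

Requiring agreement between the two labelings guards against changes that arise from the fluor and not from the treatment.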

C. Identification of Heat-Shock Related Genes

Sequencing reactions were carried out using the
PAN132 and 133 primers and a 373A automated sequencer
according to the instructions of the manufacturer
(Applied Biosystems). Sequence searches were made to the
non-redundant nucleotide database at the National Center
for Biotechnology Information (NCBI) using Macintosh
Blast software.

For dot-blot studies, samples of poly-A mRNA (9)
corresponding to 1.0, 0.1 and 0.01 µg, respectively, were
suspended in 10 x SSC, spotted onto nylon membranes
(Nytran), and crosslinked with ultraviolet light using a
Stratalinker 1800 (Stratagene). Probes were prepared
from cloned sequences (Table 1) by random priming using
a Prime-It II kit (Stratagene) in the presence of
32P-dATP. Hybridizations were carried out following the
instructions of the manufacturer. Quantitation was
performed on a PhosphorImager (Molecular Dynamics).






Example 4 

Differential Gene Expression Due to Phorbol Ester in
Human T Cells (Jurkat Cell Line)

To explore a signaling pathway distinct from the
heat shock response, microarrays constructed as in
Example 3 were used to examine the cellular effects of
phorbol ester treatment. Jurkat cells were grown to near
confluence, treated with phorbol ester, harvested, lysed
and used as a source of mRNA, as above. Samples of mRNA from
untreated or phorbol ester-stimulated cells were labeled
by reverse transcriptase incorporation of fluorescent
dCTP analogs, as above. The two-color fluorescent probes
were mixed, hybridized to microarrays, and scanned for
fluorescence emission. A total of six array elements
displayed elevated fluorescent signals with probes
derived from phorbol ester-treated cells relative to
controls (Fig. 8).

Although the invention has been described with
respect to specific embodiments and methods, it will be
clear that various changes and modifications may be made
without departing from the invention.



